Glenn Stovall's Public Notebook

Using data effectively

Problem recognition and definition

  1. Define the actions you intend to take and the decisions you will make based on your findings.
  2. Decide if this is a problem data can solve. Sometimes you can't get access to usable data. Other times stakeholders who have the power to take action or make decisions won't listen to the numbers, in which case there is no point in doing data analysis.
  3. Scope the problem. Decide what you want to figure out and what would be noise.
  4. Get specific about what you want to find out.
  5. Identify relevant stakeholders. Will they be persuaded to take action if you present them with relevant data?

Problem definition worksheet

  1. Have you defined a clear problem or opportunity to address what is important to your business or organization?
  2. Have you considered multiple alternative ways to solve the problem?
  3. Have you identified the stakeholders for the problem, and communicated with them extensively about it?
  4. Are you confident that the way you plan to solve the problem will resonate with the stakeholders, and that they will use the results to make a decision?
  5. Are you clear on what decision is to be made - and who will make it - based on the results from your analytics once the problem is solved?
  6. Have you started with a broad definition of the problem, but then narrowed it down to a very specific problem with clear phrasing on the question to be addressed, the data applied to it, and the possible outcomes?
  7. Are you able to describe the type of analytical story that you want to tell in solving this particular problem?
  8. Do you have someone who can help you in solving that particular type of analytical story?
  9. Have you looked systematically to see whether there are previous findings or experiences related to this problem either within or outside your organization?
  10. Have you revised your problem definition based on what you have learned from your review of previous findings?

Stakeholder analysis worksheet

  1. Is it clear which executives have a stake in the success of your project?
  2. Have they been briefed on the problem and the outlines of the solution?
  3. Do they have the ability to provide the necessary resources and bring about the business changes needed to be successful?
  4. Do they generally support the use of analytics and data for decision making?
  5. Does the proposed analytical story and method of communicating coincide with their typical way of thinking and acting?
  6. Do you have a plan for providing regular feedback and interim results to them?

Review previous findings

  1. All previous findings from within your company should be investigated.
  2. Look for external findings, such as books, articles, and previous studies.
  3. Use previous findings to help you decide which variables matter.

Data modeling

Data modeling is the art of selecting the variables you'll use in your analysis.

A model is a purposefully simplified representation of the phenomenon or problem. The word purposefully means that the model is built specifically to solve that particular problem. The word simplified denotes that we need to leave out all the unnecessary and trivial details and isolate the important, the useful, and the crucial features that make a difference. (From Keeping Up with the Quants.)

Any data, regardless of how subjective, can be quantified and modeled. It may or may not be useful.

Can you get better (more unique, more descriptive) data? Focus on getting better data rather than simply more data. Better data typically beats a better algorithm.

Types of data models

  • univariate - one-variable model
  • bivariate - two-variable model
  • multivariate - multi-variable model
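
As a rough illustration of the three model types, here is a minimal Python sketch using only the standard library. The data and the names (ad_spend, visits, sales, univariate_summary, pearson, correlation_matrix) are made up for this example and are not from the book. A pairwise correlation table is only one simple way to look at several variables at once, not a full multivariate model.

```python
import statistics

# Hypothetical sample data, invented purely for illustration.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
visits = [110.0, 205.0, 290.0, 410.0, 495.0]
sales = [12.0, 19.0, 33.0, 38.0, 52.0]


def univariate_summary(values):
    """Univariate: describe one variable on its own."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


def pearson(xs, ys):
    """Bivariate: Pearson correlation between two variables."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def correlation_matrix(columns):
    """Multivariate (simplified): pairwise correlations across all variables."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


if __name__ == "__main__":
    print(univariate_summary(sales))
    print(pearson(ad_spend, sales))
    print(correlation_matrix({"ad_spend": ad_spend, "visits": visits, "sales": sales}))
```

Which of the three you reach for depends on the question from the problem definition step: one variable to describe, two to relate, or several to compare together.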

Types of variables

  • Binary - true or false
  • Categorical (aka nominal) - one value from an unordered list, for example, flavors of ice cream
  • Ordinal - on a scale, such as a "strongly agree/disagree" Likert scale
  • Numerical - any number, like weight
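
The four variable types above usually need different numeric encodings before analysis. The following is a minimal sketch, assuming a hypothetical survey row with the fields subscribed, flavor, satisfaction, and weight_kg. The field names, the FLAVORS and LIKERT lists, and the helper functions are all invented for illustration.

```python
# Hypothetical category lists, invented for this example.
FLAVORS = ["vanilla", "chocolate", "strawberry"]
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def encode_binary(value):
    """Binary: true/false becomes 1/0."""
    return 1 if value else 0


def encode_categorical(value, categories):
    """Categorical: one-hot encoding, so no order is implied between options."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


def encode_ordinal(value, scale):
    """Ordinal: position on the scale; the order is meaningful."""
    return scale.index(value)


def encode_row(row):
    """Turn one hypothetical survey row into a flat list of numbers."""
    return [
        encode_binary(row["subscribed"]),
        *encode_categorical(row["flavor"], FLAVORS),
        encode_ordinal(row["satisfaction"], LIKERT),
        float(row["weight_kg"]),  # Numerical: used as-is
    ]


if __name__ == "__main__":
    example = {
        "subscribed": True,
        "flavor": "chocolate",
        "satisfaction": "agree",
        "weight_kg": 70,
    }
    print(encode_row(example))
```

One-hot encoding is used for the categorical field because assigning vanilla=0, chocolate=1, strawberry=2 would suggest an ordering that does not exist, while the Likert scale keeps its rank because the order carries information.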

Balancing data and intuition

Data should never be ignored, nor should it be the sole source of information when making a decision. An overreliance on data leads to ignoring factors that aren't as easily quantifiable, such as trust and user experience. It also leads to over-indexing on numbers that can be more easily found and understood. On the other hand, ignoring analytics completely and always going with your gut is nothing but arrogant, willful ignorance.

Presenting your findings

Data modeling, collection, and analysis are useless if people don't read the results. How you present your findings is just as important as, if not more important than, the data you find.

Find creative ways to visually display your data. Adding color and movement can make the data come alive.

Data story archetypes

  • The CSI story - finding the root cause of a bug or problem (the "crime") by picking the "culprit" out of a lineup of suspects
  • The Eureka story - finding a novel solution to a known problem
  • The mad scientist story - stories centered around experimentation and invention
  • The survey story - using survey results to reveal surprising patterns and findings about the sample population
  • The prediction story - predicting how users will behave, as search and social media algorithms do
  • The "here's what happened" story - a post-mortem

Appendix

Types of analytics

  • statistics - collection, organization, analysis, interpretation, and presentation of data
  • forecasting - estimation of some variable at a future point as a function of past data
  • data mining - looking for patterns in large quantities of data
  • text mining - deriving patterns from text
  • optimization - finding the best solution according to some criteria, subject to constraints
  • experimental design - use of test and control groups, with random assignment, to uncover the cause-and-effect relationships behind a particular outcome
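
To make the experimental design entry concrete, here is a minimal sketch of random assignment into test and control groups, followed by a simple difference in mean outcomes. The function names (assign_groups, difference_in_means) and the sample numbers are assumptions for illustration. A real experiment would also need a significance test, which is left out here.

```python
import random


def assign_groups(subjects, seed=None):
    """Randomly split subjects into test and control groups of near-equal size."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"test": shuffled[:half], "control": shuffled[half:]}


def difference_in_means(test_outcomes, control_outcomes):
    """Estimated effect: mean outcome in test minus mean outcome in control."""
    test_mean = sum(test_outcomes) / len(test_outcomes)
    control_mean = sum(control_outcomes) / len(control_outcomes)
    return test_mean - control_mean


if __name__ == "__main__":
    groups = assign_groups(range(10), seed=1)
    print(groups)
    # Hypothetical outcome measurements, invented for this example.
    print(difference_in_means([3, 5], [1, 3]))
```

Random assignment is what lets the difference between groups be read as cause and effect: on average, it balances everything other than the treatment across the two groups.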

Sources of design research

Data science library

  1. Keeping Up with the Quants. Almost everything in this file is from this book, unless otherwise specified.

Data science antilibrary